A Novel Corpus of Children’s Disordered Speech
Abstract
This paper introduces the acquisition, evaluation and baseline Automatic Speech Recognition (ASR) experiments of a novel corpus containing speech from a set of impaired and unimpaired young speakers. A group of 14 speakers with different speech disorders recorded several sessions over a 57-word Spanish vocabulary, yielding more than 3 hours of speech. In addition, a parallel corpus of more than 6 hours of speech from unimpaired young speakers has been recorded with the same vocabulary. The impaired speech corpus has been evaluated through manual labeling to detect the mispronunciations made by the speakers, and the outcome of this work shows that 17.31% of the phonemes were either mispronounced or deleted in an isolated word task. A baseline evaluation of a state-of-the-art ASR system shows a Word Error Rate (WER) of 35.02% when using Speaker Independent models based on adult speech. This WER is reduced to 27.60% using models based on children's speech and further reduced to 15.35% using speaker dependent models. Finally, experiments on connected speech show how ASR performance degrades for 4 impaired speakers in the transition from isolated words to connected speech, due to the language impairments of the speakers and the coarticulation in connected speech.
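As a rough illustration of the WER metric reported above, the sketch below computes WER as the word-level edit distance divided by the number of reference words, and derives the relative reduction between two WER values. The function names and the Spanish example words are hypothetical and not taken from the corpus; the paper's own scoring procedure is not described in the abstract and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


# Hypothetical three-word utterance with one substituted word.
print(word_error_rate("el gato negro", "el pato negro"))

# Relative WER reductions implied by the figures in the abstract.
print(relative_reduction(35.02, 27.60))  # adult models vs. children's models
print(relative_reduction(35.02, 15.35))  # adult models vs. speaker dependent
```

In an isolated word task each utterance is a single word, so this measure reduces to the fraction of misrecognized words. The edit-distance form matters for the connected speech experiments, where insertions and deletions can also occur.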
Similar resources
Improving Child Speech Disorder Assessment by Incorporating Out-of-Domain Adult Speech
This paper describes the continued development of a system to provide early assessment of speech development issues in children and better triaging to professional services. Whilst corpora of children’s speech are increasingly available, recognition of disordered children’s speech is still a data-scarce task. Transfer learning methods have been shown to be effective at leveraging out-of-domain ...
On the development of matched and mismatched Italian children's speech recognition systems
While at least read speech corpora are available for Italian children’s speech research, there exist many languages which completely lack children’s speech corpora. We propose that learning statistical mappings between the adult and child acoustic space using existing adult/children corpora may provide a future direction for generating children’s models for such data deficient languages. In thi...
Recent advances in SONIC Italian children's speech recognition for interactive literacy tutors
Recent advances in SONIC Italian children’s speech recognition will be described. This work, completing a previous one developed in the past, was conducted with the specific goals of integrating the newly trained children’s speech recognition models into the Italian version of the Colorado Literacy Tutor platform. Specifically, children’s speech recognition research for Italian was conducted us...
Italian children's speech recognition for advanced interactive literacy tutors
This work was conducted with the specific goals of developing improved recognition of children’s speech in Italian and the integration of the children’s speech recognition models into the Italian version of the Colorado Literacy Tutor platform. Specifically, children’s speech recognition research for Italian was conducted using the ITC-irst Children’s Speech Corpus. Using the University of Colo...
Automated Screening of Speech Development Issues in Children by Identifying Phonological Error Patterns
A proof of concept system is developed to provide a broad assessment of speech development issues in children. It has been designed to enable non-experts to complete an initial screening of children’s speech with the aim of reducing the workload on Speech Language Pathology services. The system was composed of an acoustic model trained by neural networks with split temporal context features and...